Search CORE

61 research outputs found

Community detection based on links and node features in social networks

Author: A. Pothen
D.M.. Blei
J. Xie
J.M. Kleinberg
M. Girvan
S.. Fortunato
S.C. Deerwester
Publication venue
Publication date: 01/01/2015
Field of study

© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are mainly based on the topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model to detect community which combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, within which each edge weight is represented by the node feature similarity between two nodes at the end of the edge. Then we fuse the original network and the created network with a parameter and employ expectation-maximization algorithm (EM) to identify a community. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm has achieved promising results compared with other algorithms

Crossref

OPUS - University of Technology Sydney

A Knowledge-Based Semantic Kernel for Text Classification

Author: D. Mavroeidis
G. Tsatsaronis
J. Demsar
R. Basili
R. Navigli
S.C. Deerwester
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Abstract. Typically, in textual document classification the documents are represented in the vector space using the “Bag of Words ” (BOW) approach. Despite its ease of use, BOW representation cannot handle word synonymy and polysemy problems and does not consider semantic relatedness between words. In this paper, we overcome the shortages of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a seman-tic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel which combines both seman-tic and statistical information from text. Empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the standard BOW representation, when Omiotis is embedded in four different classifiers

CiteSeerX

Crossref

Cogito Componentiter - Ergo Sum

Author: A. Hyvarinen
A.J. Bell
A.L. Barabasi
C. Bishop
G. Salton
J. Wagensberg
L.K. Hansen
M. Lewicki
P. Hoyer
S. Haykin
S.C. Deerwester
T. Kolenda
T.K. Landauer
Publication venue
Publication date: 01/01/2006
Field of study

Crossref

Online Research Database In Technology

Kernel Spectral Clustering and applications

In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane in a high dimensional space induced by a kernel. In addition, the multi-way clustering can be obtained by combining a set of binary decision functions via an Error Correcting Output Codes (ECOC) encoding scheme. Because of its model-based nature, the KSC method encompasses three main steps: training, validation, testing. In the validation stage model selection is performed to obtain tuning parameters, like the number of clusters present in the data. This is a major advantage compared to classical spectral clustering where the determination of the clustering parameters is unclear and relies on heuristics. Once a KSC model is trained on a small subset of the entire data, it is able to generalize well to unseen test points. Beyond the basic formulation, sparse KSC algorithms based on the Incomplete Cholesky Decomposition (ICD) and

L_0

L_1, L_0 + L_1

, Group Lasso regularization are reviewed. In that respect, we show how it is possible to handle large scale data. Also, two possible ways to perform hierarchical clustering and a soft clustering method are presented. Finally, real-world applications such as image segmentation, power load time-series clustering, document clustering and big data learning are considered.Comment: chapter contribution to the book "Unsupervised Learning Algorithms

arXiv.org e-Print Archive

Crossref

Formal Concept Analysis (FCA) begins from a context, given as a binary relation between some objects and some attributes, and derives a lattice of concepts, where each concept is given as a set of objects and a set of attributes, such that the first set consists of all objects that satisfy all attributes in the second, and vice versa. Many applications, though, provide contexts with quantitative information, telling not just whether an object satisfies an attribute, but also quantifying this satisfaction. Contexts in this form arise as rating matrices in recommender systems, as occurrence matrices in text analysis, as pixel intensity matrices in digital image processing, etc. Such applications have attracted a lot of attention, and several numeric extensions of FCA have been proposed. We propose the framework of proximity sets (proxets), which subsume partially ordered sets (posets) as well as metric spaces. One feature of this approach is that it extracts from quantified contexts quantified concepts, and thus allows full use of the available information. Another feature is that the categorical approach allows analyzing any universal properties that the classical FCA and the new versions may have, and thus provides structural guidance for aligning and combining the approaches.Comment: 16 pages, 3 figures, ICFCA 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

University of Twente Research Information

An Empirical Comparison of Text Categorization Methods

Author: B. Masand
C. Cortes
C.J.C. Burges
F. Sebastiani
G. Salton
G. Salton
G. Salton
N. Cristianini
R. Baeza-Yates
R.M. Creecy
S.C. Deerwester
T. Joachims
T. Joachims
V. Vapnik
W.S. Cooper
Y. Yang
Y. Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Crossref

Recommended from our members

Parallel computing in information retrieval - An updated review

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for Text Retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We give a description of the retrieval models used in parallel Information Processing.. We describe areas of research which we believe are needed

City Research Online

Crossref

A Linear-Algebraic Technique with an Application in Semantic Image Retrieval

Author: A. Yavlinsky
C. Tomasi
F. Monay
J.S. Hare
P. Duygulu
P.G.B. Enser
S.C. Deerwester
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

This paper presents a novel technique for learning the underlying structure that links visual observations with semantics. The technique, inspired by a text-retrieval technique known as cross-language latent semantic indexing uses linear algebra to learn the semantic structure linking image features and keywords from a training set of annotated images. This structure can then be applied to unannotated images, thus providing the ability to search the unannotated images based on keyword. This factorisation approach is shown to perform well, even when using only simple global image features

CiteSeerX

Southampton (e-Prints Soton)

Crossref